Basic visualizations (with base R)

0. Preliminary comments

R is extremely powerful when it comes to visualizations.

Here we are only touching the tip of the iceberg with plotting capabilities that come with R.

There are popular packages like “ggplot2” that make it “easier” to make many different types of plots in R.

Personally, I don’t use “ggplot2”. 1- It does not save me that many lines of code, 2- the syntax is a bit different than base R (which means I have to learn an additional “dialect” to make figures).

Base R helps keep codes consistent, packages change over time!

Visualization is VERY important for research, especially before running regressions/models. Understanding data in the rawest form can be very helpful with getting you from A to B.

Learn to appreciate ingredients for analysis. :)

1. Basic scatter and line plots

Let’s import some data

Real GDP since 1947: https://fred.stlouisfed.org/series/GDPC1

Go to ?as.Date for help to convert it to dates.

The line data$date[1] + 1:10 understands it creates a sequence of dates for plotting (x-axis).

  data <- read.csv("GDPC1.csv", stringsAsFactors=F) # point to where the file is
  
  # Convert dates
  head(data)
  class(data$DATE)
  data$date <- as.Date(data[,1], "%Y-%m-%d")
  class(data$date)
  data$date[1] + 1:10 # R understands you are adding days

You can’t do data$DATE[1] + 1:10 because you can’t add numbers to a character. That’s why you need to force as.Date.

?substr is to extract or replace substrings in a character vector.

  substr("abcdefg",1,3)
  
  substr("abcdefg",6,7)
  
  data$year <- as.numeric(substr(data$DATE,1,4))
  data$month <- as.numeric(substr(data$DATE,6,7))
  
  # Create a continuous "time" variable for managing dates
  data$time <- data$year + (data$month-1)/12

Basic plots

With plots, first one is x axis, second one is y-axis.

Argument type allows for different types of plots (lines, steps, bars, overlay of plots and lines).

Lwd characterizes the line width.

# Basic plot: 

  plot(data$date, data$GDPC1, main = "Generic")

  plot(data[,c("date","GDPC1")], main = "Generic, alterate code") 

  # similar, but this one is more text. But gives you label names.
  
  # Lines of different sorts
  
  plot(data[,c("date","GDPC1")], type="l", main = "Lines") # lines

  plot(data[,c("date","GDPC1")], type="s", main = "Steps") # steps

  plot(data[,c("date","GDPC1")], type="h", main = "Bars") # bars

  plot(data[,c("date","GDPC1")], type="o", main = "Points and lines") # overlay points and lines

  # Wider lines
  
  plot(data[,c("date","GDPC1")], type="l", lwd=4, main = "Wider lines") 

  # Change color
  
  plot(data[,c("date","GDPC1")], type="l", lwd=4, col="red", main = "Color and 'ablines' ") 

  # Reference lines

  abline(v=1960) # vertical reference line in 1960; does not work well, why?
  
  abline(v=as.Date("1960-1-1")) # vertical reference line in 1960

  abline(v=seq(as.Date("1960-1-1"),as.Date("2020-1-1"), 365 * 20)) # every 20 years
  
  abline(h=c(5000, 10000, 15000), lty=2) # horizontal dashed line

  # Use numeric time variable
    
  plot(data[,c("time","GDPC1")], type="l", main = "Vertical dashed ablines") # lines
  
  abline(v=seq(1960,2010,10), lty=3) # vertical reference line in 1960

The command abline creates lines in plots. v specifies vertical lines, h specifies horizontal.

Putting the line at v=1960 doesn’t work because it puts it in the ROW in the dataset, not the date itself. Have to make sure where to reference according to dates.

lty specifies the dashes, 1 is solid, 2 is dashed, 3 is more dashes,

2. Customizing plots

Axis 1 is the bottom, axis 2 is LHS, axis 3 is top, axis 4 is RHS.

Las changes the orientation of the text on the axis. las=2 creates horizontal axis labels.

axis(1, las=1, at=seq(1950,2010,20)) specifies exactly WHERE the labels and tick marks should be located.

Adding back in axis and text is a layering process so have to be careful to make sure it’s not ugly.

  # Remove axes and labels...
  
  plot(data[,c("time","GDPC1")], type="l", axes=F, xlab="", ylab="", main = "Adding axes") 
  # lines
  
  # Add customized axis, go to `?axis`
  axis(1)
  axis(2)
  axis(3, las=2)
  axis(4, las=2)

  # Add axis with horizontal labels
  plot(data[,c("time","GDPC1")], type="l", axes=F, xlab="", ylab="", main = "Horizontal labels")
  # lines
  axis(1, las=1)
  axis(2, las=2)

  # Add axis with horizontal labels
  plot(data[,c("time","GDPC1")], type="l", axes=F, xlab="", ylab="", main = "Horizontal labels")
  # lines
  axis(1, las=1, at=seq(1950,2010,20)) 

  # Say we want to add little tick marks every 5 years
  plot(data[,c("time","GDPC1")], type="l", axes=F, xlab="", ylab="" , main = "Adding ticks") 
  # lines
  axis(1, las=1, at=seq(1950,2020,10))
  axis(1, las=1, at=seq(1955,2015,10), tck=-.01, lwd.tick=1, lwd=0, labels=F)
  axis(2, las=2)

  # Add labels to axis: mtext
  plot(data[,c("time","GDPC1")], type="l", axes=F, xlab="", ylab="", main = "Mtext")
  axis(1, las=1, at=seq(1950,2020,10))
  axis(1, las=1, at=seq(1955,2015,10), tck=-.01, lwd.tick=1, lwd=0, labels=F)
  axis(2, las=2)
  mtext("Year", side=1, line=0) # too close to the margin
  mtext("Year", side=1, line=2, font=2) # out a bit with bold font
  mtext("Year", side=1, line=3, font=3) # out even more with italics
  mtext("GDP" , side=2, line=3.5, cex=2) # huge letter

  # You can also add text with text(), go to `?text` for more options
  
  plot(data[,c("time","GDPC1")], type="l", axes=F, xlab="", ylab="", main = "Text (in plot)")
  # lines
  axis(1, las=1, at=seq(1950,2020,10))
  axis(1, las=1, at=seq(1955,2015,10), tck=-.01, lwd.tick=1, lwd=0, labels=F)
  axis(2, las=2)
  text(1980,5000, "USA")
  text(1980,seq(5,0,-1)*1000, "USA", cex=6:1) # what happens? dissapears

  # You cannot plot stuff outside the "box"
  plot(data[,c("time","GDPC1")], type="l", axes=F, xlab="", ylab="", main = "Outside the box") 
  box(col="red", lwd=5)
  
  # Unless... you really want to
  par(xpd=TRUE)
  text(1980,seq(5,0,-1)*1000, "USA", cex=6:1, col="blue") # what happens? dissapears

  par(xpd=FALSE) # Turn this off for subsequent plots 

  # Perhaps more useful.. you may want to emphasize GDP each decade in the plot
  par(mar=c(4,5,2,2)) 
  plot(data[,c("time","GDPC1")], type="l", lwd=2, col="blue", axes=F, xlab="", ylab="",
       main = "Nice plot") 
  axis(1, las=1, at=seq(1950,2020,10))
  axis(1, las=1, at=seq(1955,2015,10), tck=-.01, lwd.tick=1, lwd=0, labels=F)
  axis(2, las=2)

mtext allows you to add text on the MARGINS for labels.

text allows for placing of text inside the graph, just have to specify where.

Create a box around the plot by calling the command after generating the plot.

What happens when you want to plot outside the box?? Have to REALLY specify it. Set par(xpd=TRUE) to put it outside the plotting box. When working with parameters, have to turn it on/off

Add text….

  1. find a way to get the GDP each decade automatically

Command which gives you TRUE/FALSE. Applying idx <- which(data$time %in% seq(1950,2010,10)) creates an index for each year on those select years.

  idx <- which(data$time %in% seq(1950,2010,10))  # position in the dataset
  data$time[idx]
  data$GDPC1[idx]
  plot(data[,c("time","GDPC1")], type="l", axes=T, xlab="", ylab="", main = "Points") 
  points(data$time[idx],data$GDPC1[idx], col="blue") # hollow circle
  
  #Go to `?points` to see how to specify points
  
  points(data$time[idx],data$GDPC1[idx], col="blue", pch=16) # hollow circle
  
  text(data$time[idx]+4,data$GDPC1[idx]-400, round(data$GDPC1[idx]), col="blue") # hollow circle

pch argument in points lets you choose what you want your points to look like.

The text(data$time[idx]+4,data$GDPC1[idx]-400, round(data$GDPC1[idx]), col="blue") command puts labels. Adding numbers when specifying the GDP and time vars puts them to the side of the points specified.

  # Add legend, see `?legend` for more options

  plot(data[,c("time","GDPC1")], type="l", main = "Legends")
  legend("topleft", c("GDP"), lwd=1, pch=16, col="blue")  

  legend("topleft", c("GDP"), lwd=1, pch=16, col="blue", inset=-.05)  

  legend("bottomright", c("GDP"), lwd=1, pch=16, col="blue", bty="n", title="Legend:")
 
  legend("bottomright", c("GDP"), lwd=1, pch=16, col="blue", title="Legend:", bg="cornsilk", inset=.2)

  # Margins
  #go to `?par` to understand what you can do with all of par functionality 
  
  par(mar=c(5,5,5,5)) # large margin
  plot(data[,c("time","GDPC1")], type="l", main = "Large Margin") 

  par(mar=c(0,0,0,0)) # no margin
  plot(data[,c("time","GDPC1")], type="l", main = "No margin") 

  par(mar=c(2,3,1,1)) # tight margin
  plot(data[,c("time","GDPC1")], type="l", main = "Tight margin")

  par(mar=c(5, 4, 4, 2) + 0.1)

Make sure you reset the margin to the original plot margins after changing the margins drastically! The original/classic margin is set at par(mar=c(5, 4, 4, 2) + 0.1)

3. Writing to the disk

Up to now figures have been plotted in the RStudio device. To write it to a file on disk, you have to explicitly tell R to do so. Make sure to set type="windows" for png because working on a Windows computer.

  #To see what png command does, go to `?png`

  # Here is how you start writing a PNG file to the working directory  
  png("myfirstfig.png", width=600, height=800, pointsize=20, bg="transparent", type="windows")
   
  # Get a figure that you want
  par(mar=c(4,5,2,2)) # tight margin
  plot(data[,c("time","GDPC1")], type="l", lwd=3, col="blue", axes=F, xlab="", ylab="") # lines
  axis(1, las=1, at=seq(1950,2020,10))
  axis(1, las=1, at=seq(1955,2015,10), tck=-.01, lwd.tick=1, lwd=0, labels=F)
  axis(2, las=2) 
  abline(v=seq(1950,2020,10), lty=2, lwd=.5)
  abline(h=seq(5000,15000,5000), lty=2, lwd=.5)
  box()
  mtext("Gross Domestic Product", side=2, line=4)
  mtext("Year", side=1, line=2.5)
  
  # Once you are done you have to "close off" the device, so that R 
  # stops writing stuff to the image.
  
  dev.off()

GO check it out!

You can tweak many things… Try the following: - pointsize = 20 - bg = “transparent” or any color - changing width and height - type = “Xlib”

# Compute quarter to quarter growth rate, here is an example
  
  x <- c(1:10)^2
  
  diff(x) # note this is shorter than x (by 1)
## [1]  3  5  7  9 11 13 15 17 19
  diff(log(x)) # growth rate in log points
## [1] 1.3862944 0.8109302 0.5753641 0.4462871 0.3646431 0.3083014 0.2670628
## [8] 0.2355661 0.2107210
# For help with diff, go to ?diff

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Exercise 1: Compute the quarterly growth rate (in log points) of the US economy and store it in the “data” dataframe as the variable “growth”.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

You write c(NA,...) because you’re specifying that the growth rate is calculated by the column vector not the row vector.

data$growth <- c(NA, diff(log(data$GDPC1)))

plot(data$time, data$growth, type="h", main = "Exercise 1")

4. Barplots

beside = T/F means that if false, the columns of height are portrayed as stacked bars, and if TRUE the columns are portrayed as juxtaposed bars.

 # Simple barplot 
  x <- c(1:100)^.5
  barplot(x, main = "Simple barplot")  
  
  # Notice axis starts at 0
  axis(1)

  # Add spaces
  barplot(x, space=c(1), main = "Spaces")  

  # Group bar by group
  xgroup <- matrix(x, ncol=5)
  dim(xgroup)
  
  # Stacked bar plot
  barplot(xgroup, main = "Stacked barplot")

  # Side-by-side bar plot
  barplot(xgroup, beside=T, main = "Side-by-side barplot")

  barplot(xgroup, beside=T, space=c(0,5), main = "Space added") # add more space between groups

  # To make sure labels are added
  colnames(xgroup) <- letters[1:ncol(xgroup)]
  
  barplot(xgroup, beside=T, space=c(0,5), main = "Group names") # add more space between groups

  # You can also customize these figures! 
  barplot(xgroup, beside=T, space=c(0,5), axes=F, main = "Axes labels added") 
  # add more space between groups
  axis(2, las=2)
  abline(h=0)
  mtext("Height", side=2, line=2)

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Exercise 2: Plot growth rate (y axis) over time (x axis) as a blue bar for positive quarter and red for negative quarters. Add a reference line for the 0 growth rate. Hint: you may use either plot or barplot, and ifelse() to create a vector of colors.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

barplot(data$growth, xlab = "", ylab = "", col = ifelse(data$growth>0,"blue","red"), axes=F)
abline(h=0, col="green", lwd = 3)
axis(1, at=seq(1950,2020,10)) #not sure why it didn't work
axis(2, las= 2)
mtext("Growth Rate", side = 2, line = 3)
mtext("Time", side = 1, line = 2)

5. Shaded areas

The density argument in rect is what gives it transparency.

  # Shaded areas (rectangles)
  # Adding boxes or shaded areas to plots: rect()
  plot(data[,c("time","GDPC1")], type="l", col="blue", lwd=2, main = "Adding shaded areas")

  rect(xleft=1950, xright=1960, ybottom=2500, ytop=17000)

  rect(xleft=1970, xright=1980, ybottom=2500, ytop=17000, border="red", col="green")

  # Adding some transparency to boxes

  rect(xleft=1990, xright=2000, ybottom=2500, ytop=17000, border="cyan", col="blue", density=25)
  
  #`?adjustcolor`
  
  rect(xleft=2010, xright=2020, ybottom=2500, ytop=17000, col=adjustcolor("red", alpha.f=0.5), border=NA)

  # As many things with R, you can add many boxes at the same time by using vectors
  plot(data[,c("time","GDPC1")], type="l", col="blue", lwd=2, main = "Shading with vectors")
  starts <- seq(1950,2010,20)
  rect(xleft=starts, xright=starts+5, ybottom=0, ytop=20000, col=adjustcolor("black", alpha.f=0.2), border=NA)

By using adjustcolor, can slightly change around the shade of the colors without having to specify a unique color combo.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Exercise 3: Add a transparent grey shaded area over the previous the figure of the previous exercise over quarters with negative growth.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

This line of code is not fully correct, but I need to check with Ariel to see what I’m doing wrong.

  barplot(data$growth, xlab = "", ylab = "", col = ifelse(data$growth>0,"blue","red"), axes=T)
  abline(h=0, col="green", lwd = 3) 
  starts <-data$time[data$growth<0]
  step <- diff(data$time)[1]/2
  rect(xleft = starts - step, xright = starts + step, ybottom = -1, ytop = 1, 
       col=adjustcolor("black", alpha.f=0.2), border=NA)

Ask Ariel about what the %*% means

  # Shaded areas (polygons)
  
  # Let's create some fake data
  nobs <- 100
  x <- runif(nobs,0,10)
  xmat <- cbind(1,x,x^2,x^3) # intercept plus x, x^2, x^3
  beta <- cbind(c(20,3,-1.5,.11)) # intercept plus coefficients
  error <- 5 * rnorm(nobs)
  y <- xmat %*% beta + error
  
  # Scatter plot
  plot(x,y, xlim=c(0,10), main = "Regression with fitted line & errors")
  
  # Run regression
  reg <- lm(y ~ xmat[,-1] ) # drop intercept from matrix
  reg <- lm(y ~ x + I(x^2) + I(x^3) ) # equivalent
  
  # Fitted values
  yhat <- fitted(reg)
  
  # Sort x
  idx <- order(x)
  lines(x[idx],yhat[idx], col="red", lwd=2)
    
  # To plot error band we need to compute the variance of Yhat
  var.yhat <- xmat %*% vcov(reg) %*% t(xmat)
  se.yhat   <- sqrt(diag(var.yhat))
  
  # Upper and lower limit, 95% confidence band
  upper <- yhat + 1.96 * se.yhat
  lower <- yhat - 1.96 * se.yhat
  
  # Adding lines
  lines(x[idx],upper[idx], col="red", lty=2)
  lines(x[idx],lower[idx], col="red", lty=2)

  # Add shaded area using polygon, go to ?polygon for more info
  
  plot(x[idx],yhat[idx], col="red", lwd=2, type="l",ylim=range(y), 
       main = "Regression with polygon")
  
  polygon(x=c(x[idx],rev(x[idx])), y=c(upper[idx],rev(lower[idx])))

  plot(x[idx],yhat[idx], col="red", lwd=2, type="l",ylim=range(y), xlab="x", ylab="y", 
       main = "Regression with shaded polygon")
  
  polygon(x=c(x[idx],rev(x[idx])), y=c(upper[idx],rev(lower[idx])), col=adjustcolor("red",alpha.f=.25), border=NA)

  points(x,y, pch=21, col="black", bg="lightgrey")

Space for notes

6. Colors

  # Let's create another fake dataset to highlight how to deal with colors
  nobs <- 10^4
  x <- runif(nobs,0,100)
  
  # Will add colors to these dots based on the level of x
  plot(x,1:nobs, pch=1, main = "Plain nobs")

  # Changing colors
  plot(x,1:nobs, pch=1, col="red", main = "Red nobs")

  plot(x,1:nobs, pch=1, col=ifelse(x>50,"red","grey"), main = "Red/Grey nobs")

  # Using pre-packaged colors
  colors()
  
  cols <- sample(colors(), 5) # 10 random colors
  
  # Let's colorize these points based on whether they fall in the interval
  # 0-20, 20-40, ... 80-100.
  class <-  ifelse(x<20,1,2) # 1 for those below 20, 2 otherwise
  class <-  ifelse(x<20,1,ifelse(x<40,2,3)) # 1 for those below 20, 2 below 40, 3 otherwise.
  class <-  ifelse(x<20,1,ifelse(x<40,2,ifelse(x<60,3,ifelse(x<80,4,5)))) # and so on
  
  # Plot
  plot(x,1:nobs, pch=1, col=cols[class], main = "Random colors by interval")

  # What if you have many more classes, writing ifelse statements if a bit tedious
  # You can use a built in function in R
  class2 <- findInterval(x, seq(0,100,20))

  plot(x,1:nobs, pch=1, col=cols[class2], main = "Random colors by interval, alternate code")

  # Use nice color schemes from R Color Brewer package
  install.packages("RColorBrewer", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/gemma/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
  library(RColorBrewer)
  
  # ?brewer.pal takes you to the description of the color palette
  
  display.brewer.all()

Place to take notes

  # Colorize with the Spectral color scale
  cols <- brewer.pal(11, "Spectral") # SPectral scale has only 11 colors max
  
  class3 <- findInterval(x, seq(0,100, length.out=length(cols)))

  plot(x,1:nobs, pch=1, col=cols[class3], main = "Brewer Spectral Palette")

  # Making up your own colors: what if you want a finer color scale?
  
  # You can interpolate colors! Go to ?colorRampPalette
  
  manycols <- colorRampPalette(cols)(100)
  
  class4 <- findInterval(x, seq(0,100, length.out=length(manycols)))
  
  plot(x,1:nobs, pch=1, col=manycols[class4], main = "Interpolated colors")

7. Boxplots

Boxplots are useful to represent variation across classes. Go to ?boxplot for help/more info.

  # Generate some fake data
  x <- c(1:5,4:1,2:10,9:5)
  x <- matrix(x, ncol=length(x), nrow=100, byrow=T)
  x <- x + rnorm(length(c(x))) * c(x)/5
  colnames(x) <- letters[1:ncol(x)]  
  
  # View data a bit
  head(x)
  
  # Basic boxplot
  boxplot(x, main = "Basic boxplot") 

  boxplot(x, width=seq(1,2,length.out=ncol(x)), main = "Narrower boxes") 

  boxplot(x, notch=T, main = "notch = T") 

  boxplot(x, boxwex=1, main = "Adjust box width (boxwex)") 

  boxplot(x, horizontal=T, main = "Horizontal boxplot") 
  
  abline(h=18.5) # you can use abline to add divisions

  # Customize a bit
  boxplot(x, axes=F, col=NA, border=NA, main = "Customized boxplot") # empty plot
  axis(1, colnames(x), at=1:ncol(x))
  axis(2, las=2)
  abline(h=seq(0,15,5), col="grey")
  legend("topleft", "Outliers",pch="*",col="darkred", inset=.05, pt.cex=2)
  boxplot(x, axes=F, col="coral", border="darkred", lty=1, pch="*", cex=1.5, add=T)
  box()

8. Simple maps

  # Make sure packages are installed
  install.packages("maps", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/gemma/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
  install.packages("mapproj", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/gemma/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
  library("maps")
  library("mapproj")
  # This package has a bunch of pre-packages maps
  
  map("world")

  map("usa")

  map("state")

  map("state", fill=T, col=sample(colors(),48))

  # You can combine them  
  map("world")
  map("state", fill=T, col=sample(colors(),48), add=T)

  map("state")
  map("county", add=T)

  map("state")
  data(us.cities)
  map.cities(us.cities, col="darkred", bg="coral", pch=21)

  # You can change the projection
  map("state")

  library(mapproj)
  map("state", project="lambert",par=c(30,40)) # projections require parameters

  map("world")

  map("world",proj="orthographic",orientation=c(15,255,0)) 

  map("world",proj="orthographic",orientation=c(15,260,0)) 
  map("state",proj="orthographic",orientation=c(15,260,0), add=T) 

Every state contains its own FIPS code (ID) and county level ID too.

For assignment: Create histogram to look at breaks and classes (ranges), then define breaks. Then add abline to see the level of breaks if they match the data. Then create legend to define the breaks (leg.txt). Look at the one the NYT uses.

For colors: need n-1 colors from the breaks colors <- colorRampPalette(brewer.pal(9, "Reds"))(length(breaks)-1) If need more colors, can interpolate using colorRampPalette.

cut function tells you what interval each data falls into the defined break buckets. Turns the function into a factor class. Factors are good for categorical variables. (Also can create dummy vars when running regressions.)

Doing as.Numeric before cut forces the levels to drop and creates an indicator variable. Shows us which county falls into which bucket.

Make sure to add fill = TRUE when mapping colors.

Have to match FIPS code with the county names in the map.

Use county.fips$polyname to match with the county names in the map. Then match again with the color buckets in the unemployment data.

  #You can add colors by specifying a vector of color of the same length and order
  # than the location names in the map
  data(unemp) # load economic data
  head(unemp) # contains county IDs (FIPS code)
  
  # define color buckets
  breaks <- c(0, 2, 4, 6, 8, 10, 12, 100)
  leg.txt <- c("<2%", "2-4%", "4-6%", "6-8%", "8-10%", "10-12%", ">12%")
  colors <- colorRampPalette(brewer.pal(9, "Reds"))(length(breaks)-1)
  unemp$colorBuckets <- as.numeric(cut(unemp$unemp, breaks))
  
  # Align data with map definitions by (partial) matching state,county
  # names, which include multiple polygons for some counties
  cnty.fips <- county.fips$fips[match(map("county", plot=FALSE)$names, county.fips$polyname)]
  colorsmatched <- unemp$colorBuckets [match(cnty.fips, unemp$fips)]
  
  # draw map
  map("county", col = colors[colorsmatched], fill = TRUE,  lty = 0, projection = "polyconic")
  map("state", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 1, projection="polyconic")
  title("Unemployment by county, 2009")
  legend("topright", leg.txt, horiz = TRUE, fill = colors)

9. Animations

  # Basic map
  o <- c(15,260,0)
  map("world", proj="orthographic",orientation=o) 
  map("county",proj="orthographic",orientation=o, col = colors[colorsmatched], fill = TRUE, resolution = 0, lty = 0, add=T)
  map("state",proj="orthographic",orientation=o, add=T, lwd=.5) 

  # Make it turn by changing orientation
  o <- c(15,260,0) + c(0,-5,0) # run the code above again

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Exercise 4: Create a series of figures that make the planet rotate a bit. Write these figures to the disk. Hint: 1- write a loop where the central parameter runs from 200 to 300.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

  install.packages("magick")
  install.packages("animation") 
  library(magick)
  library(animation)
  
  rotate.map <- function(angles, write=F) {
  for (i in angles) {
    print(i)
    number <- substr(match(i,rev(angles))+1000,2,5) 
      # this ensures the planet rotates in the right direction
    if (write) png(paste("map_",number,".png",sep=""),600,600) 
    par(mar=c(0,0,0,0))
    o <- c(15,i,0) 
    map("world", proj="orthographic",orientation=o) 
    map("county",proj="orthographic",orientation=o, col = colors[colorsmatched], fill = TRUE, resolution = 0, lty = 0, add=T)
    map("state",proj="orthographic",orientation=o, add=T, lwd=.5) 
    if (write) dev.off()
  }
}
rotate.map(200:300)

  ani.options(interval = 0.1, nmax = 100) # set options
  
  # Write to disk 
  saveGIF(rotate.map(300:200), movie.name = "my_rotating_world2.gif", 
          convert = "magick", clean = TRUE)

10. Multi-panel plots

 # Let's create a function that generates a graph
  n <- 10^4
  f <- function(n)  plot(runif(n), col=colorRampPalette(brewer.pal(11,"Spectral"))(n), pch=16)

  # Check it out
  f(n)

  # The ability to create multi-panel plots can be controlled with par(), go to ?par for help
  
  # Multi panel plots 1: par() 
  par(mfrow=c(3,2)) # fill by row
  lapply(1:6, function(x) {
    f(n)
    text(n/2,.5,x,cex=8)
  })

  par(mfcol=c(3,2)) # fill by column
  lapply(1:6, function(x) {
    f(n)
    text(n/2,.5,x,cex=8)
  })

  par(mfcol=c(3,2), mar=c(0,0,0,0)) # no margins !
  lapply(1:6, function(x) {
    f(n)
    text(n/2,.5,x,cex=8)
  })

  par(mfcol=c(3,2), mar=c(0,0,0,0), oma=c(6,6,6,6)) # add some outer margins
  lapply(1:6, function(x) {
    f(n)
    text(n/2,.5,x,cex=8)
  })
  
  #  Add some text outside
  mtext("X label for the entire figure", side=1, outer=T, line=2.5)
  mtext("Y label for the entire figure", side=2, outer=T, line=2.5)
  mtext("Overall Title", side=3, outer=T, line=2.5, cex=2)

Multi panel plots 2: layout() Provides a bit more flexibility, quicker than using the above command (in terms of loading/rendering).

  # Create a "layout" matrix
  par(mfcol=c(3,2), mar=c(0,0,0,0), oma=c(6,6,6,6)) # add some outer margins
  lmat <- matrix(c(1,2,3,4,5,5), ncol=2, byrow=T)
  layout(lmat, widths=c(1,1), heights=c(1,1,2))  
  lapply(1:5, function(x) {
    f(n)
    text(n/2,.5,x,cex=8)
  })

Question for Ariel: Can you add titles to those plots? Or will it have to be by using the mtext option?

11. Custom plots: e.g. time series grid

Download state-level GDP data for the USA 1. go to https://apps.bea.gov/regional/downloadzip.cfm 2. Select “Gross Domestic Product (GDP)” then “Annual GDP by State” 3. Place files in “SAGDP” folder 4. Select files for 1997-2017 only for simplicity

  data <- lapply(state.abb, function(state) {
    
    print(state)
    # File name
    fname <- paste("SAGDP/SAGDP9N_",state,"_1997_2017.csv",sep="")
    # Import file
    d <- read.csv(fname, stringsAsFactors=F)
    # Select desired variables
    d2 <- as.numeric(d[1,paste("X",1997:2017,sep="")])
    d2 <- data.frame(state,data.frame(t(d2)))
    names(d2)[-1] <- 1997:2017
    # Export
    d2
  })
  data <- do.call("rbind", data)

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Exercise 5: Compute annual growth rate for each state in 1 line of code
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

  growth <- t(apply(data[,-1],1, function(x) diff(log(x)) ))
  colnames(growth) <- 1998:2017
  rownames(growth) <- data[,1]

What does t(apply(..)) do?

  # Heat map
  image(t(growth))

  # Customize heat map
  years <- 1998:2017
  breaks <- c(-2,seq(-.1,.1,.02),2)
  colors <- colorRampPalette(brewer.pal(10,"RdBu"))(length(breaks)-1)
  image(t(growth), breaks=breaks, col=colors, axes=F)
  axis(1, at=seq(0,1,length.out=length(years))    , labels=seq(years[1],years[length(years)],1))
  mtext(state.abb, side=2, at=seq(0,1,length.out=length(state.abb)), las=2, cex=.8, line=.5)
  mtext("GDP growth rates by US state", side=3, font=2, line=1)
  box()

  # Confirm Alaska starts the time series with negative growth
  growth["AK",]
  
  # Sort growth data post 2008
  meang <- apply(growth[,paste(2010:2017)], 1, function(x) mean(x))
  
  # Customize heat map
  image(t(growth[order(meang),]), breaks=breaks, col=colors, axes=F)
  axis(1, at=seq(0,1,length.out=length(years))    , labels=seq(years[1],years[length(years)],1))
  mtext(state.abb[order(meang)], side=2, at=seq(0,1,length.out=length(state.abb)), las=2, cex=.8, line=.5)
  mtext("GDP growth rates by US state", side=3, font=2, line=1, cex=1)
  mtext("(sorted by post-recession growth rates)", side=3, font=3, line=0)
  box()

  # Make it a plot with dots
  bucketmat <- growth
  bucketmat[] <- findInterval(growth, breaks) 
  
  # Plot
  plot(1, ylim=c(1,length(state.abb)), xlim=range(years), axes=F, xlab="", ylab="")
  axis(1)  
  axis(2, at=1:length(state.abb), labels=state.abb, las=2)  
  lapply(years, function(y) {
    lapply(state.abb, function(s) {
      points(y,match(s,state.abb), col=colors[bucketmat[s,paste(y)]])
    })
  })  

  # Change symbol and improve colors
  plot(1, ylim=c(1,length(state.abb)), xlim=range(years), axes=F, xlab="", ylab="")
  axis(1)  
  axis(2, at=1:length(state.abb), labels=state.abb, las=2)  
  lapply(years, function(y) {
    lapply(state.abb, function(s) {
      points(y,match(s,state.abb), bg=colors[bucketmat[s,paste(y)]], pch=ifelse(growth[s,paste(y)]>0,24,25), cex=1.5, lwd=1, col="darkgrey")
    })
  })  

Place for notes

12. Writing high-quality files to the disk

You’ll soon realize that the figures you first generate in your papers and projects may not have the required resolution or format for certain journals.

There are two main types of figure formats: bitmap and vector

Bitmap are stored as “pixels” on a regular grid. The resolution of a bitmap image reflects the number of pixels per inch. Low resolution allows the human eye to see “pixels” or pixellation. This is obviously undesirable so journals may ask you to create higher resolution figures.

In some cases journals do not accept bitmap images for figures that can be written as a vector file. A vector file is stored as a series of vectors and polygons. This representation does not have a “resolution” so vector files can be printed in larger sizes without creating the pixellation.

When do you decided which one to do? Depends at what stage in your project you are. Early on, it might be easier to create bitmap images that can be easily included in your word processor.

However, once your paper is accepted for publication, the journal may ask you to either increase the resolution or change the format. This can be time-consuming, so make sure you think about this potential change ahead of time.

# Bitmap example with png(), go to ?png for help

  plot(1,1,pch="A", cex=15)
  
  # Write to disk with a low resolution
  png("bitmap_1.png", width=500, height = 500, res=50)
  plot(1,1,pch="A", cex=15)
  dev.off()
  
  # Increase resolution
  png("bitmap_2.png", width=500, height = 500, res=50*4)
  plot(1,1,pch="A", cex=15)
  dev.off()
  
  # Increase resolution + increase image size
  png("bitmap_3.png", width=500*4, height = 500*4, res=50*4)
  plot(1,1,pch="A", cex=15)
  dev.off()
  
  # Other bitmap formats: jpeg, tiff, bmp
  jpeg("bitmap_4.jpg", width=500*4, height = 500*4, res=50*4)
  plot(1,1,pch="A", cex=15)
  dev.off()

If you zoom in you still see pixelation Because of this, some journals may ask for a vector file

  # Vector example with:  PDF with cairo_pdf()

  ?cairo_pdf
  
  # Create a 7 x 7 inches PDF
  cairo_pdf("vector_1.pdf", width=7, height = 7)
  plot(1,1,pch="A", cex=15)
  dev.off()
  
  # Now zoom in... Can you see pixels?
  
  # Other vector formats: postscript with cairo_ps()
  cairo_ps("vector_2.ps", width=7, height = 7)
  plot(1,1,pch="A", cex=15)
  dev.off()

THE END